perm filename FACIL.PUB[NSF,MUS] blob
sn#096510 filedate 1974-04-10 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00006 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 .PAGE←93
C00010 00003 %A
C00023 00004 .NEXT PAGE
C00027 00005 %1
C00040 00006 %A
C00048 ENDMK
C⊗;
.PAGE←93
.NEXT PAGE
%B
.BEGIN CENTER
V. RESEARCH FACILITIES
A. EXISTING FACILITIES
.END
%A
1. HARDWARE FACILITIES AT THE STANFORD AI LAB
.BEGIN FILL ADJUST
.SELECT 1
All of our research to date has been done at the Stanford Artificial
Intelligence laboratory.
The facilities include the following:
.END
.SELECT C
.NARROW 6,6
Central Processors:
Digital Equipment Corporation PDP-10
and PDP-6
Primary Store:
65K words of 1.7 microsecond DEC Core
65K words of 1 microsecond Ampex Core
131K words of 1.6 microsecond Ampex Core
Swapping Store:
Librascope disk (5 million words, 22
million bits/second transfer rate)
File Store:
IBM 3330 disc file, 6 spindles (leased)
Peripherals:
4 DECtape drives, 2 mag tape drives
(7 channel IBM compatible), line printer,
Calcomp plotter, Xerox Graphics Printer
Communications Processor:
BBN IMP (Honeywell DDP-516) connected to
the ARPA network
Terminals:
58 Television displays, 6 Information
International vector displays, 3 IMLAC
displays, 15 teletype terminals, 4
Texas Instrument terminals
Satellite Processors:
Digital Equipment Corporation PDP-11/45,
Signal Processing Systems 41.
Satellite Memory:
96K 16-bit words of Intel MOS memory
Audio Equipment:
14-Bit 50Kc Analog to Digital converter
with 4 channel multiplexing, 16-bit
200Kc Digital to Analog converter with
4 channel multiplexing, Scully 4-track
tape recorder, 4 dolby system A noise
reducers, Sony 4-track tape recorder,
2 Dynaco stereo 70 watt amplifiers,
4 Altec 804 monitor speakers
Special Equipment:
4 television computer-controlled cameras,
2 mechanical arms, remotely computer-controlled
vehicle, laser depth ranging device
.WIDEN
%1
.BEGIN FILL ADJUST
The bulk of this equipment has been purchased through contract
research supported by the Advanced Research Projects Agency, with
some additional funds provided by the National Science Foundation,
and the National Institute of Mental Health.
The analog audio equipment (tape recorders, Dolby units, etc.) is owned
by the Stanford Department of Music.
The facility is organized as an interactive time-sharing display-oriented
system. Powerful graphical and interactive techniques are available as
well as a host of computer languages, library functions, utility routines,
and text manipulation programs. The system is well human-engineered so as
to make research convenient and natural.
The laboratory itself serves Stanford professors, research associates,
and graduate students, principally in the field of Artificial Intelligence
research, but with other concerns, such as Mathematical Theory of Computation,
Computer aids to Autistic children, and several others. It is a delightful
place to work due to the very diverse interests and skills of the participants,
as well as the computational power of the facility.
It should be mentioned here that the addition of the 16-bit D/A converter
and the 14-bit A/D converter, replacing the previous 12-bit units, was
done just this past year. The system is a conventional low-noise, high precision
conversion system but has one novel feature. The digital to analog converter
has a mode in which it accepts 9-bit bytes of compressed data from the
PDP-10 and from this recovers the 16-bit sample. The compression method
was discovered by simulating many different compression schemes. The
method which provided the most reduction with the least distortion was
the floating-point incremental method. The results using this method
are indistinguishable from direct 16-bit conversion to even the most
discerning of ears. The encoding consists, roughly, of taking the difference
between consecutive samples, making that into a floating-point number,
and truncating the exponent to 4 bits and the mantissa to 5 bits. The exponent
and mantissa together form a 9-bit byte which is sent to the D/A conversion
unit, where the floating number is first fixed and then added into a 16-bit
register which is then used as the input to a 16-bit digital-analog conversion
module. The encoding method is actually slightly more complex than this,
in that we do not subtract consecutive samples, but instead subtract the
predicted state of the 16-bit register in the conversion unit from the
current sample. This keeps errors from accumulating.
Our experiments show that this could be reduced to an 8-bit byte (4 bits
exponent, 4 bits mantissa) without loss of fidelity.
.END
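The floating-point incremental scheme described above can be sketched roughly as follows. This is only an illustrative model, not the logic of the actual conversion unit: the byte layout (exponent in the high 4 bits, a signed 5-bit mantissa in the low 5 bits), truncation by arithmetic shift, and all function names are assumptions made for the example.

```python
def clamp16(value):
    """Limit a value to the range of a signed 16-bit register."""
    return max(-32768, min(32767, value))

def encode_delta(diff):
    """Float a signed difference: 5-bit mantissa (-16..15), 4-bit exponent."""
    exponent, mantissa = 0, diff
    while not -16 <= mantissa <= 15:
        mantissa >>= 1          # arithmetic shift discards the low-order bit
        exponent += 1
    return exponent, mantissa

def pack(exponent, mantissa):
    """Assumed layout: exponent in the high 4 bits, mantissa in the low 5."""
    return (exponent << 5) | (mantissa & 0x1F)

def unpack(byte):
    """'Fix' the floating number: recover the signed integer increment."""
    exponent, mantissa = byte >> 5, byte & 0x1F
    if mantissa >= 16:
        mantissa -= 32          # sign-extend the 5-bit mantissa
    return mantissa << exponent

def compress(samples):
    """Encode each sample against the predicted state of the D/A register."""
    register = 0                # mirrors the register in the conversion unit
    codes = []
    for sample in samples:
        byte = pack(*encode_delta(sample - register))
        register = clamp16(register + unpack(byte))
        codes.append(byte)
    return codes

def expand(codes):
    """Model of the conversion unit: accumulate increments in a 16-bit register."""
    register = 0
    output = []
    for byte in codes:
        register = clamp16(register + unpack(byte))
        output.append(register)
    return output
```

Because `compress` subtracts the predicted register state rather than the previous sample, the truncation error made on one sample is folded into the next difference, so it does not accumulate.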
.GROUP SKIP 2
%A
2. EXISTING RESEARCH SUPPORT SOFTWARE
%1
.BEGIN FILL ADJUST
In the course of our research, we have developed a number of
original and useful programs. It is beyond the scope of this proposal
to give a complete description, or even a complete list of all the
programs we have written. Some of the principal ones are briefly
listed below:
.END
%C
GENERAL-PURPOSE PROGRAMS
%1
.BEGIN FILL ADJUST
The heart of our experimental sound generation is a highly flexible
acoustical compiler which is an ALGOL-based
descendant of Bell Laboratories' MUSIC V system. This
program enables us to produce sounds by any known synthesis technique
as well as experiment with new techniques. We have been able to
program versatile reverberation routines, spatial localizing routines,
and many others.
To connect the synthesis programs with the audio system, there is a
set of programs which communicate with that system. For playing synthetic
tones, there are programs which can send a disk file to the digital-analog
converter for any combination of available sampling rates and channel
selections. Monaural, stereo, and quadraphonic files can be played and
recorded on any of the tape recorders. Natural sounds can be digitized
and stored on disk files. There are sound file editors which can display
a segment of a digitized waveform, synthetic or natural, display
the discrete Fourier transform of a segment of sound, select sub-segments
from longer sound files, and perform many other splicing and editing functions.
A general purpose function generating program allows the user to specify
the Fourier components of a complex wave, with control over amplitude
and phase of each of the components. In addition, the program is
used to generate time-domain functions by means of numerical specification
or by use of a light-pen. The program stores all functions as a file
on the disk memory.
There is a sampling rate conversion program which can convert between
any two sampling rates that are rational multiples of each other. It
selects the proper low-pass filter and accomplishes the conversion with
minimum distortion.
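A minimal sketch of conversion between rationally related sampling rates is given here for illustration. It is not the program described above: the Hamming-windowed sinc filter, its length, and the function names are assumptions. The ratio of the two rates is reduced to up/down, the signal is conceptually padded with zeros to the common higher rate, low-pass filtered at the lower of the two Nyquist frequencies, and then decimated.

```python
import math
from fractions import Fraction

def lowpass_taps(up, down, half_width=16):
    """Hamming-windowed sinc with cutoff at the lower of the two Nyquist rates."""
    cutoff = 0.5 / max(up, down)        # cycles per sample at the padded rate
    half = half_width * max(up, down)
    taps = []
    for k in range(-half, half + 1):
        x = 2.0 * cutoff * k
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 + 0.46 * math.cos(math.pi * k / half)
        taps.append(2.0 * cutoff * sinc * window * up)   # gain of `up` restores level
    return taps, half

def resample(samples, old_rate, new_rate):
    """Convert between two integer sampling rates (rational ratio up/down)."""
    ratio = Fraction(new_rate, old_rate)
    up, down = ratio.numerator, ratio.denominator
    taps, half = lowpass_taps(up, down)
    out = []
    for n in range(len(samples) * up // down):
        center = n * down               # position in the zero-padded stream
        # Only padded positions that are multiples of `up` hold real samples,
        # so the zeros are skipped rather than multiplied.
        first = -(-(center - half) // up)        # ceiling division
        last = (center + half) // up
        acc = 0.0
        for i in range(max(first, 0), min(last, len(samples) - 1) + 1):
            acc += samples[i] * taps[center - i * up + half]
        out.append(acc)
    return out
```

For example, `resample(tone, 50000, 25000)` halves the rate, and `resample(tone, 25000, 50000)` doubles it.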
.GROUP SKIP 2
%C
PROGRAMS FOR REVERBERATION AND LOCALIZATION RESEARCH
%1
We have written an interactive reverberator compiler which aids the
researcher in the design of a compound reverberator. The program calculates
reverberation parameters, generates and displays the resultant program language
description of the reverberator as it is being constructed, displays and
plays (on command) the impulse response of the reverberator as each unit
reverberator is defined, and reverberates an existing sound file.
An interactive graphic program has been developed for the control of
a moving sound source trajectory in the simulated two-dimensional
space. The program accepts input from the teletype
to specify a computed trajectory, or from the light-pen to specify
a `hand' determined trajectory. The program derives the control functions
for azimuth, distance, and velocity, displays them, and stores them
on the disk, on command.
.NEXT PAGE
%C
PROGRAMS FOR MUSIC INSTRUMENT RESEARCH
%1
A set of programs exists which implements the heterodyne analysis of
music instrument tones. The first pass includes FFT's for
overlapping windows, which are processed in the next pass to
determine average frequencies of the harmonics of the tone. In the
next pass, these frequencies set the heterodyne filter both in window
size (the fundamental period) and in center frequency (the particular
harmonic being analyzed). The output is a set of time-varying
frequency and amplitude functions for each harmonic listed. The
final pass consists of a heuristic program which extracts, by an
examination of phase variation and amplitude, the analyzed sound
segment which most probably corresponds to the actual tone. It scans
the functions for new estimates of the average frequencies of the
harmonics, their peak amplitudes, and extracts various other
information on the tone. At this point, or above, data compression
by any power of two can be performed by an averaging technique.
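The heterodyne pass alone can be sketched as follows, under simplifying assumptions: the fundamental is taken as already known (standing in for the FFT passes), a plain rectangular window of one fundamental period is used, and the function name and arguments are hypothetical. Each window is multiplied by a complex exponential at the harmonic's center frequency and averaged; the magnitude gives the amplitude function, and the change in phase between windows gives the frequency function.

```python
import cmath
import math

def heterodyne(samples, rate, fundamental, harmonic, hop=None):
    """Return (amplitudes, frequencies) for one harmonic, one value per window."""
    period = int(round(rate / fundamental))      # window size: one fundamental period
    hop = hop or period
    center = harmonic * fundamental              # center frequency of the filter
    omega = 2.0 * math.pi * center / rate
    amplitudes, frequencies = [], []
    previous_phase = None
    for start in range(0, len(samples) - period + 1, hop):
        acc = 0j
        for n in range(start, start + period):
            acc += samples[n] * cmath.exp(-1j * omega * n)
        acc /= period
        amplitudes.append(2.0 * abs(acc))
        phase = cmath.phase(acc)
        if previous_phase is None:
            frequencies.append(float(center))
        else:
            # Wrap the phase change into [-pi, pi) before converting to Hz.
            delta = (phase - previous_phase + math.pi) % (2.0 * math.pi) - math.pi
            frequencies.append(center + delta * rate / (2.0 * math.pi * hop))
        previous_phase = phase
    return amplitudes, frequencies
```

Averaging adjacent pairs of output values would then give data compression by a factor of two, and repeating the averaging gives any power of two.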
Another set of programs is used to display, plot, and modify the
analyzed functions obtained above. Three-dimensional rotation of
perspective plots of the amplitude or frequency functions for the
whole set of harmonics, temporal line-spectrum displays,
spectrographic displays, and many other forms of graphics are
obtainable (as described above in Section IIA1). Furthermore, a host
of operations can be performed on the analyzed functions, including
smoothing, light-pen modification, line-segment approximation,
spectral envelope modification, amplitude and frequency modulation,
and many more which serve as tools for research.
Another program is used for optimized additive synthesis, based on
the directly analyzed or modified functions. Tones can be
synthesized at arbitrary frequencies, amplitudes and durations, from
any specified list of amplitude and frequency functions, paired as
desired. This allows for the generation of a tone with a reduced set
of harmonics, or a set of harmonics from different instrumental
origins, or the pairing of arbitrary amplitude and frequency
functions from different instruments or having differing operations
performed beforehand. Furthermore, procedures can be written for this
synthesis program which allow for further data manipulation, as in
the interpolation between two tones (described in Section IIA3). The
program is capable of reverberation, and accepts any arbitrary
note/procedure list which specifies a temporal ordering of events,
each having associated parameters for synthesis operations.
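The core of additive synthesis from paired amplitude and frequency functions can be sketched as below. This is an illustration only, not the optimized program described above: the functions are assumed to be stored as line-segment breakpoint lists of (time, value) pairs, and all names are hypothetical.

```python
import math

def interpolate(breakpoints, t):
    """Evaluate a line-segment function given as sorted (time, value) pairs."""
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

def additive_synthesis(harmonics, duration, rate):
    """Sum sinusoids; `harmonics` is a list of (amplitude_fn, frequency_fn) pairs."""
    count = int(duration * rate)
    output = [0.0] * count
    for amplitude_fn, frequency_fn in harmonics:
        phase = 0.0
        for i in range(count):
            t = i / rate
            output[i] += interpolate(amplitude_fn, t) * math.sin(phase)
            # Accumulate phase so that a time-varying frequency stays continuous.
            phase += 2.0 * math.pi * interpolate(frequency_fn, t) / rate
    return output
```

Because each harmonic is just a pair of functions, an amplitude function from one instrument can be paired with a frequency function from another simply by building the list differently.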
A dynamic display program is used for evaluating the results of
decisions in FM synthesis. The amplitudes of the frequency components
are displayed as a function of time where the modulation index is
interpolated between any two values. The program asks for a ratio of
carrier to modulating frequencies, beginning and ending values of the
index, and the number of steps in the interpolation. The displayed
information includes the value of the index at every step, so that
the user can interrupt the interpolation to examine the spectral
shape in relation to the value of the index.
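The computation behind such a display can be sketched as follows, assuming simple FM with a sine carrier and a sine modulator, for which the component at carrier + n times the modulating frequency has amplitude J_n(index), a Bessel function of the first kind. The function names, the power-series evaluation, and the number of sidebands kept are choices made for this illustration, not features of the display program itself.

```python
import math

def bessel_j(order, x, terms=40):
    """Bessel function of the first kind, by its power series (small x only)."""
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k / (math.factorial(k) * math.factorial(k + order))
                  * (x / 2.0) ** (2 * k + order))
    return total

def fm_spectrum(ratio, index, sidebands=10):
    """Map frequency (in units of the modulating frequency) to signed amplitude.

    `ratio` is the carrier-to-modulator frequency ratio.
    """
    spectrum = {}
    for n in range(-sidebands, sidebands + 1):
        amplitude = bessel_j(abs(n), index)
        if n < 0 and n % 2:
            amplitude = -amplitude          # J(-n) = (-1)^n J(n)
        frequency = ratio + n
        if frequency == 0:
            continue                        # sin(0) contributes nothing
        if frequency < 0:
            # Negative frequencies fold back with inverted sign.
            frequency, amplitude = -frequency, -amplitude
        spectrum[frequency] = spectrum.get(frequency, 0.0) + amplitude
    return spectrum

def sweep_index(ratio, start, stop, steps):
    """Yield (index, spectrum) as the index is interpolated from start to stop."""
    for step in range(steps + 1):
        index = start + (stop - start) * step / steps
        yield index, fm_spectrum(ratio, index)
```

Each yielded pair corresponds to one frame of the display, so stopping the iteration at any step gives the spectrum together with the value of the index that produced it.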
An equally useful representation of the same data exists in a three-
dimensional interactive display program. In this program, the effect
of complex functions for both amplitude and modulation index can be
examined as they affect the spectrum shape.
.GROUP SKIP 2
%C
PROGRAMS FOR PERCEPTUAL EXPERIMENTATION
%1
A whole host of programs exist which conduct on-line psychoacoustical
experiments, playing tones over the D-A converter, allowing listeners
to hear stimuli again, often in any order, and sometimes having
listeners directly manipulate certain aspects of the signal. (It should
be mentioned that the real-time system which we propose would allow
listeners to manipulate aspects of the signal and get immediate
feedback, giving us a much more powerful tool to
get at many of the most important aspects of the perception of instrument
tones.) Responses of listeners are taken, stored, and eventually analyzed
by other programs, e.g. analysis of variance and multidimensional scaling.
A particularly useful program exists for the display of the results from
multidimensional scaling experiments. Using a series of mirrors, it
allows the viewer to see convincing three-dimensional configurations
in realistic perspective. Rotation can be performed, and two configurations
can be observed and compared simultaneously.
.END
.NEXT PAGE
.SELECT B
.BEGIN CENTER
B. PROPOSED FACILITY
.END
%A
1. HARDWARE FACILITY
%1
.BEGIN FILL ADJUST
The process of analysis and synthesis as described in the preceding
sections demands great quantities of computer time. The computations
can take as much as 100 times the length of the produced tone. This
requires great patience from the researcher. Since many of the results
are empirical, many experiments have been done and must continue to
be done. For this reason, we propose a special-purpose computing system.
The system would initially be a satellite to the AI lab, but would serve
not only to reduce the computation load of the PDP-10, but by means of
special-purpose hardware, actually synthesize tones in reverberant
environments in real time. The system would be powerful enough to stand
alone if necessary. Since it would be extremely expensive to purchase
a system with the human engineering of the AI lab facility, we propose
to use the AI lab facility as an interface to the special-purpose
system, thus minimizing change-over inconveniences and initial setup
price. As work continues, it would be possible to upgrade the system
to stand alone and eventually provide a high degree of service without
the aid of the AI facility. To this end, we propose using an existing
time-sharing system as the resident monitor in the special-purpose
system. This saves us the trouble of having to write device controllers,
memory management programs, and other system-level functions. Since
we propose to begin with a time-sharing system, upward compatibility
is assured. Programs will continue to run unmodified as the system is upgraded.
Although the detailed hardware budget is given in section V, we shall
discuss the main items here. The following is a list of the principal
components of the proposed facility:
.END
.SELECT C
Digital Equipment Corporation PDP-11/45 Computer
with floating-point unit
and memory management module
Digital Equipment Corporation RP03 disc drive.
Provides 20 million 16-bit words of storage.
Systems Concepts signal processor with digital
reverberation module
Digital Equipment Corporation GT-40 graphics terminal
Audio equipment, including 8-track tape recorder,
quiet booth, acoustically treated room,
8-channel amplifiers and matched speakers
%1
.BEGIN FILL ADJUST
The heart of the system is a PDP-11/45 with floating-point processor
and memory management module. This is a powerful mini-computer capable
of high-speed arithmetic and basic time-sharing operation.
The purpose of such a computer is to provide a versatile test bed for
research. The floating-point processor greatly aids numerical computations
of the type which are so common in digital signal processing and multi-dimensional
scaling. The memory management module provides for time-multiplexing of
multiple tasks, so as to more effectively utilize the computer by running
processes while others are idle. It also provides for cooperating parallel
processes which aids decomposition of large tasks into smaller modules.
For bulk
storage, an RP03 disk system is included. This is necessary for
real-time operation.
The bulk of the disk is to store digitized audio and control information
for the signal processor. In analysing music instrument tones, we
digitize recorded natural tones to a precision of 14 bits (stored in
16-bit PDP-11 words) at rates up to 50,000 samples per second.
One can easily see that to store individual notes of each of the
orchestral instruments in all of their characteristic playing modes
would require a staggering amount of storage. The RP03 disk can provide
storage for up to 400 seconds of single-channel sound at the full sampling rate. This is a compromise between
price and working requirements, representing the minimum amount of
storage that can be effective and useful.
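The storage figure follows from simple arithmetic, sketched here using the numbers quoted in this section (20 million 16-bit words, one sample per word, 50,000 samples per second). The function name and the per-channel breakdown are added for illustration only.

```python
DISK_WORDS = 20_000_000      # RP03 capacity in 16-bit words
SAMPLE_RATE = 50_000         # samples per second per channel

def seconds_of_sound(channels=1, words=DISK_WORDS, rate=SAMPLE_RATE):
    """Duration of sound that fits on the disk, one sample per word."""
    return words / (rate * channels)

for channels in (1, 4, 8):
    print(channels, "channel(s):", seconds_of_sound(channels), "seconds")
```

On the same assumptions, four-channel material would fill the disk in a quarter of the single-channel time, and eight-channel material in an eighth.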
An alternative to getting an RP03 disk would be to use the AI
facility's disk storage. This is unwise because the required
rate of data transfer between the machines would demand
a high-speed data channel for communication with the AI PDP-10.
The most efficient solution
is inclusion of some amount of bulk storage on the PDP-11 itself, and
for real-time operation, this is the only solution.
The most important item of all is the Systems Concepts Signal
Processor. This is a highly-parallel, special-purpose, programmable, digital
processor designed especially for the generation and processing of audio. Together
with the matching reverberation unit, it provides enough power and flexibility
to synthesize all the tones we have produced to date in real time, even with
spatial localization and 4-channel reverberation involved.
It can even do the computations in the heterodyne filter analysis in
real time.
The processor is designed to be controlled by a small computer and
a PDP-11 interface is a standard option. The utility of such a processor
cannot be overemphasized.
.NEXT PAGE
When one attempts to generate sounds using a new technique, often there
is no good way to make an %5a priori%1 prediction of the range of the controlling
parameters. A good example of this is deciding over what range to sweep
the modulation index of an FM instrument. With our current turnaround time,
one must "shoot in the dark" in attempting to find the correct parameter, often
wasting tremendous amounts of computer time as well as personal time in
the process. The PDP-11/45 by itself offers no speedup, but combined with
the Systems Concepts Signal Processor, provides the solution to the problem.
It would be possible to directly connect a knob, via some PDP-11 support
software, to synthesis parameters, thus allowing the experimenter to directly
control the parameter as the sound is generated. This increases the
efficiency of the research process immensely. It also makes possible
experiments which would be otherwise impractical or even impossible. One
example of this would be a two-knob experiment where one knob controls
the duration of a note and one controls the loudness. To do this experiment
without real-time generation of the sound would require preparation of
a large number of sample sounds beforehand. With three knobs, the size of such
samples exceeds even the bounds of the entire AI lab bulk storage. It is
much more efficient to store a sound as the program which can synthesize
it rather than the digitized waveform itself.
A whole new domain of experiments immediately suggests itself, based on
interactive control of complex attributes of synthesized sound.
We wish to add to the signal processor the reverberation memory option.
This device provides for a number of variable-length digital delays
which are easily interfaced with the signal processor itself to provide
reverberation in all the forms we have realized to date, and have enough
generality to provide for any future forms of reverberation we may
discover. Again, the parameters of the reverberation could be easily
attached to knobs, giving the user direct real-time control over the
character of the reverberation. This would greatly enhance the productivity
of the researcher.
It should be mentioned again that the Signal Processor is of such generality
and power that it is easily capable of synthesizing speech using
the methods that are currently popular. Most speech synthesis is done
by taking an excitation function of some kind, often a pulse train for
voiced phonemes and white noise for frication, and applying spectral
shaping filters to produce formants. This is the idea of the
synthesis by linear prediction of the speech waveform [Atal].
Since the Signal Processor is capable of generating a large number of
excitation functions and realizing a large number of
digital filters all in real time
with complete freedom of interconnection,
it would seem to be capable of synthesizing
several voices, possibly as many as four, in real time. Although our research
is not directly concerned with speech synthesis, the techniques for programming
and controlling the Signal Processor would certainly be applicable to the
synthesis of speech.
Again, such a device as the Systems Concepts Signal Processor
could be interfaced directly to the AI PDP-10, but
it requires real-time control of the type that is not generally available
in time-sharing systems.
The GT-40 display console is a general-purpose graphics terminal.
It provides a direct
graphical interface to the PDP-11 and thus to the Signal Processor.
As was noted many times in the previous sections, the use of computer
graphics is an essential piece of human engineering that we take
advantage of constantly. The AI facility has made graphics highly
available and easily used. It thus has crept into many programs as
a debugging aid as well as a research aid.
The audio equipment is the final stage of the sound production. The
waveform must be converted to sound by an audio system whose quality
matches the extreme quality of digital synthesis. We propose to place
8 speakers in an acoustically treated room for listening tests. The number
8 is somewhat of a compromise with cost, as we have little %5a priori%1
reason to believe that 8 channels will be enough. Our only real evidence
is our success with 4 channels.
The importance of the listening room is
great. The environment of a computer facility is very noisy. Computers
require extensive cooling and air conditioning, making the computer room
sound somewhat like a continuous hurricane. In addition, the particular
building we are in uses forced-air cooling in each room, adding a gentle
but disturbing hiss to all offices. To do perceptual testing, it is
essential that the acoustical environment be controlled
entirely by the investigator.
A small quiet booth is also proposed for recording and analysing sounds
from real music instruments. The 8-track tape recorder provides off-line
storage of acoustical data. Audio tape is still the most economical way
to store large quantities of audio, at the cost of some loss of fidelity
and signal-to-noise ratio.
.END
.GROUP SKIP 2
%A
2. PROPOSED RESEARCH SUPPORT SOFTWARE
%1
.BEGIN FILL ADJUST
A certain amount of software would have to be written to integrate
the proposed system into the current system.
This could be done in layers, each preserving compatibility
with the previous systems but each advancing the researcher's
capabilities. The following discussion attempts to summarize
the software which would have to be developed.
.END
%C
THE PDP-11 MONITOR
%1
.BEGIN FILL ADJUST
Although we intend to use the UNIX time-sharing system from Bell
Laboratories as a base, it will need some amount of modification
to deal with our specific set of peripherals: the GT-40, the RP03
disk, the PDP-10 interface, and the Signal Processor. It is not
expected that this would require a great deal of effort to get
a basic monitor up and running. Improvements and streamlining can
always be added later as the need arises.
.END
.NEXT PAGE
%C
THE SIGNAL PROCESSOR
%1
.BEGIN FILL ADJUST
Writing software for the Signal Processor presents somewhat of a
problem. One must provide methods not only for setting up the
internal computation flow, but also for synchronization and data
transfer with cooperating PDP-11 processes. One example might
be interactive control of mixing natural and synthesized sounds.
The PDP-11 would be reading analog to digital converter output,
interactive knob settings, as well as controlling the signal processor
and providing it with the parameters read from the knob and the
waveform from the A/D converter all at once. The theory of
cooperating parallel processes, however, is rather well developed,
so that straightforward methods can be applied here with little difficulty.
It is a great advantage that the UNIX time-sharing system provides
for parallel processes and has a highly developed inter-process
communication system.
Only slight modifications would have to be made to provide for real-time
interaction. The efficiency of the inter-process communication system
would have to be examined and possibly reprogrammed to assure that
it does not incur prohibitive overhead.
One would write a program for the Signal Processor just as one
writes a program for a computer. At first, the language would be
a low-level one, corresponding to an assembler for a computer. As we
learn more about controlling such a device, the language could be
upgraded to a higher level. One would specify communication with
other processes through a system of ports. At run time, the user
would be able to specify which ports should be connected to which
channels. It is then that the user would specify which communication
ports should be directed to which knobs, and which ports should be
connected to which disk files, and so on.
This maintains a level of generality which permits great flexibility
in testing and repeating a run.
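The port idea can be illustrated with a small sketch. Nothing here describes an existing interface: the class, the port names, and the knob and file sources are all hypothetical, and only show how a program written against named ports might be bound to different sources at run time.

```python
class SignalProgram:
    """A program that reads its parameters through named ports."""

    def __init__(self, name, input_ports):
        self.name = name
        self.input_ports = tuple(input_ports)
        self.bindings = {}

    def connect(self, port, source):
        """Bind a port to any callable that returns the next value."""
        if port not in self.input_ports:
            raise KeyError("unknown port: " + port)
        self.bindings[port] = source

    def read_parameters(self):
        """Read one value from every port; all ports must be connected."""
        missing = [p for p in self.input_ports if p not in self.bindings]
        if missing:
            raise RuntimeError("unconnected ports: " + ", ".join(missing))
        return {port: self.bindings[port]() for port in self.input_ports}

def knob(settings, name):
    """Source that returns the current position of a (simulated) knob."""
    return lambda: settings[name]

def file_source(values):
    """Source that replays stored values, standing in for a disk file."""
    iterator = iter(values)
    return lambda: next(iterator)

# The same program can be run from a knob or repeated from stored data.
program = SignalProgram("fm_instrument", ["index", "amplitude"])
knobs = {"knob1": 2.5}
program.connect("index", knob(knobs, "knob1"))
program.connect("amplitude", file_source([0.1, 0.2, 0.3]))
```

Repeating a run exactly would then be a matter of reconnecting the `index` port to a file of recorded knob positions, with no change to the program itself.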
At first, we would merely seek to construct an interface with our
existing PDP-10 synthesis programs so as to ease the change-over
problems. This would allow for direct synthesis of all the sounds
we have generated to date, and would allow research to proceed
at a much greater rate, but would not allow for real-time
interaction until the assembler for the Signal Processor was
completed.
Since many of the analysis routines, as well as synthesis routines,
could be executed on the Signal Processor, one would want ways to
send digitized waveforms to the Signal Processor. It should be
possible to come in directly from the analog-to-digital converter
as well as retrieve stored waveforms from the disk. The output of
the signal processor could be directed to the digital-to-analog
converter as well as directed to a disk file. The system
could also get at files on the AI Lab IBM 3330 disk, but only at
a limited rate.
This ability to store the output of the signal processor is quite
important. The most compelling reason is one of growth. As we progress,
it is quite possible that we will demand computations so complex that
even the Signal Processor cannot complete them in real time. In this
case, we must have the option of directing the output to the disk. In
this manner, we could break up the computations into smaller runs and
mix them in the PDP-11 for later playback.
There are also many utility routines that would have to be written
for the PDP-11. We would need an implementation of the
fast Fourier transform algorithm, an intermediate-level graphics
package for convenience in displaying complex functions, routines
for getting at the digital-analog converters, as well as ways to
direct asynchronous processes in a uniform and convenient manner.
All of this is programming support that would be needed for
interfacing our current research to the new system. Eventually,
we would begin writing research programs directly on the PDP-11.
.END
.NEXT PAGE